NSF PAR Search | NSF Public Access Repository

Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction

Kerr, Justin; Kim, Chung_Min; Wu, Mingxuan; Yi, Brent; Wang, Qianqian; Goldberg, Ken; Kanazawa, Angjoo (November 2024, Conference on Robot Learning)

Humans can learn to manipulate new objects by simply watching others; providing robots with the ability to learn from such demonstrations would enable a natural interface for specifying new behaviors. This work develops Robot See Robot Do (RSRD), a method for imitating articulated object manipulation from a single monocular RGB human demonstration given a single static multi-view object scan. We first propose 4D Differentiable Part Models (4D-DPM), a method for recovering 3D part motion from a monocular video with differentiable rendering. This analysis-by-synthesis approach uses part-centric feature fields in an iterative optimization, which enables the use of geometric regularizers to recover 3D motions from only a single video. Given this 4D reconstruction, the robot replicates object trajectories by planning bimanual arm motions that induce the demonstrated object part motion. By representing demonstrations as part-centric trajectories, RSRD focuses on replicating the demonstration's intended behavior while considering the robot's own morphological limits, rather than attempting to reproduce the hand's motion. We evaluate 4D-DPM's 3D tracking accuracy on ground-truth annotated 3D part trajectories and RSRD's physical execution performance on 9 objects across 10 trials each on a bimanual YuMi robot. Each phase of RSRD achieves an average success rate of 87%, for a total end-to-end success rate of 60% across 90 trials. Notably, this is accomplished using only feature fields distilled from large pretrained vision models, without any task-specific training, fine-tuning, dataset collection, or annotation.
Full Text Available
GARField: Group Anything with Radiance Fields

Kim, Chung_Min; Wu, Mingxuan; Kerr, Justin; Goldberg, Ken; Tancik, Matthew; Kanazawa, Angjoo (June 2024, CVPR)

Grouping is inherently ambiguous due to the multiple levels of granularity in which one can decompose a scene -- should the wheels of an excavator be considered separate or part of the whole? We present Group Anything with Radiance Fields (GARField), an approach for decomposing 3D scenes into a hierarchy of semantically meaningful groups from posed image inputs. To do this we embrace group ambiguity through physical scale: by optimizing a scale-conditioned 3D affinity feature field, a point in the world can belong to different groups of different sizes. We optimize this field from a set of 2D masks provided by Segment Anything (SAM) in a way that respects coarse-to-fine hierarchy, using scale to consistently fuse conflicting masks from different viewpoints. From this field we can derive a hierarchy of possible groupings via automatic tree construction or user interaction. We evaluate GARField on a variety of in-the-wild scenes and find it effectively extracts groups at many levels: clusters of objects, objects, and various subparts. GARField inherently represents multi-view consistent groupings and produces higher fidelity groups than the input SAM masks. GARField's hierarchical grouping could have exciting downstream applications such as 3D asset extraction or dynamic scene understanding. See the project website at https://www.garfield.studio/
Full Text Available
Video Prediction Models as Rewards for Reinforcement Learning

Escontrela, Alejandro; Adeniji, Ademi; Yan, Wilson; Jain, Ajay; Peng, Xue Bin; Goldberg, Ken; Lee, Youngwoon; Hafner, Danijar; Abbeel, Pieter (December 2023, Advances in Neural Information Processing Systems)

Full Text Available
AutoBag: Learning to Open Plastic Bags and Insert Objects

https://doi.org/10.1109/ICRA48891.2023.10161402

Chen, Lawrence Yunliang; Shi, Baiyu; Seita, Daniel; Cheng, Richard; Kollar, Thomas; Held, David; Goldberg, Ken (May 2023, IEEE)
All You Need is LUV: Unsupervised Collection of Labeled Images Using UV-Fluorescent Markings

https://doi.org/10.1109/IROS47612.2022.9981768

Thananjeyan, Brijen; Kerr, Justin; Huang, Huang; Gonzalez, Joseph E.; Goldberg, Ken (October 2022, 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS))

Full Text Available
FogROS 2: An Adaptive and Extensible Platform for Cloud and Fog Robotics Using ROS 2

Ichnowski, Jeffrey; Chen, Kaiyuan; Dharmarajan, Karthik; Adebola, Simeon; Danielczuk, Michael; Mayoral-Vilches, V; Zhan, Hugo; Xu, Derek; Ghassemi, Ramtin; Kubiatowicz, John; et al. (May 2023, Proceedings IEEE International Conference on Robotics and Automation)

Mobility, power, and price points often dictate that robots do not have sufficient computing power on board to run contemporary robot algorithms at desired rates. Cloud computing providers such as AWS, GCP, and Azure offer immense computing power on demand, but tapping into that power from a robot is non-trivial. We present FogROS2, an open-source platform to facilitate cloud and fog robotics that is compatible with the emerging Robot Operating System 2 (ROS 2) standard. FogROS2 is completely redesigned and distinct from its predecessor FogROS1 in 9 ways, and has lower latency, overhead, and startup times; improved usability; and additional automation, such as region and computer type selection. Additionally, FogROS2 was added to the official distribution of ROS 2, gaining performance, timing, and additional improvements associated with ROS 2. In examples, FogROS2 reduces SLAM latency by 50%, reduces grasp planning time from 14 s to 1.2 s, and speeds up motion planning 28x. When compared to FogROS1, FogROS2 reduces network utilization by up to 3.8x, improves startup time by 63%, and reduces network round-trip latency by 97% for images using video compression. The source code, examples, and documentation for FogROS2 are available at https://github.com/BerkeleyAutomation/FogROS2, and FogROS2 is available through the official ROS 2 repository at https://index.ros.org/p/fogros2/
Full Text Available
DayDreamer: World Models for Physical Robot Learning

Wu, Philipp; Escontrela, Alejandro; Hafner, Danijar; Goldberg, Ken; Abbeel, Pieter (January 2022, Conference on Robot Learning)

Full Text Available
FogROS: An Adaptive Framework for Automating Fog Robotics Deployment

https://doi.org/10.1109/case49439.2021.9551628

Chen, Kaiyuan Eric; Liang, Yafei; Jha, Nikhil; Ichnowski, Jeffrey; Danielczuk, Michael; Gonzalez, Joseph; Kubiatowicz, John; Goldberg, Ken (August 2021, 2021 IEEE 17th International Conference on Automation Science and Engineering (CASE))

As many robot automation applications increasingly rely on multi-core processing or deep-learning models, cloud computing is becoming an attractive and economically viable resource for systems that do not contain high computing power onboard. Despite its immense computing capacity, it is often underused by the robotics and automation community due to lack of expertise in cloud computing and cloud-based infrastructure. Fog Robotics balances computing and data between the cloud and edge devices. We propose a software framework, FogROS, as an extension of the Robot Operating System (ROS), the de-facto standard for creating robot automation applications and components. It allows researchers to deploy components of their software to the cloud with minimal effort, and correspondingly gain access to additional computing cores, GPUs, FPGAs, and TPUs, as well as predeployed software made available by other researchers. FogROS allows a researcher to specify which components of their software will be deployed to the cloud and to what type of computing hardware. We evaluate FogROS on 3 examples: (1) simultaneous localization and mapping (ORB-SLAM2), (2) Dexterity Network (Dex-Net) GPU-based grasp planning, and (3) multi-core motion planning using a 96-core cloud-based server. In all three examples, a component is deployed to the cloud and accelerated with a small change in system launch configuration. Although network communication adds latency of 1.2 s, 0.6 s, and 0.5 s, the computation speed is improved by 2.6x, 6.0x, and 34.2x, respectively.
Full Text Available
AVPLUG: Approach Vector PLanning for Unicontact Grasping amid Clutter

https://doi.org/10.1109/CASE49439.2021.9551652

Avigal, Yahav; Satish, Vishal; Tam, Zachary; Huang, Huang; Zhang, Harry; Danielczuk, Michael; Ichnowski, Jeffrey; Goldberg, Ken (January 2021, IEEE International Conference on Automation Science and Engineering (CASE))

Mechanical search, the finding and extracting of a known target object from a cluttered environment, is a key challenge in automating warehouse, home, retail, and industrial tasks. In this paper, we consider contexts in which occluding objects are to remain untouched, thus minimizing disruptions and avoiding toppling. We assume a 6-DOF robot with an RGBD camera and unicontact suction gripper mounted on its wrist. With this setup, the robot can move both camera and gripper in order to identify a suitable approach vector, reach in to achieve a suction grasp of the target object, and extract it. We present AVPLUG: Approach Vector PLanning for Unicontact Grasping, an algorithm that uses an octree occupancy model and Minkowski sum computation to find a collision-free grasp approach vector. Experiments in simulation and with a physical Fetch robot suggest that AVPLUG finds an approach vector up to 20× faster than a baseline search policy.
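The collision check described in the abstract can be illustrated with a minimal sketch. It is not the AVPLUG implementation: a flat set of integer grid cells stands in for the octree occupancy model, the gripper is a hypothetical set of cell offsets, and the function names (`minkowski_sum`, `reflect`, `approach_is_free`) are invented for this example. The idea shown is that a gripper at reference cell p collides with an obstacle exactly when p lies in the Minkowski sum of the obstacle set and the reflected gripper footprint.

```python
from itertools import product


def minkowski_sum(a, b):
    """Minkowski sum of two sets of integer grid cells (tuples of equal length)."""
    return {tuple(p + q for p, q in zip(u, v)) for u, v in product(a, b)}


def reflect(cells):
    """Reflect a set of cells through the origin."""
    return {tuple(-c for c in cell) for cell in cells}


def approach_is_free(obstacles, footprint, start, direction, steps):
    """Check a straight-line approach for collisions.

    obstacles: occupied cells (stand-in for an octree occupancy model).
    footprint: hypothetical gripper shape as offsets from its reference cell.
    The gripper reference cell moves from `start` along `direction`
    for `steps` unit steps.
    """
    # Reference positions at which the footprint would overlap an obstacle.
    blocked = minkowski_sum(obstacles, reflect(footprint))
    for k in range(steps + 1):
        cell = tuple(s + k * d for s, d in zip(start, direction))
        if cell in blocked:
            return False
    return True


if __name__ == "__main__":
    obstacles = {(2, 0)}
    footprint = {(0, 0), (1, 0)}  # gripper spans two cells along x
    print(approach_is_free(obstacles, footprint, (0, 3), (0, -1), 3))
    print(approach_is_free(obstacles, footprint, (1, 3), (0, -1), 3))
```

Inflating the obstacles once lets each candidate approach vector be tested with simple set lookups instead of a per-step shape intersection.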
Full Text Available
Policy Gradient Bayesian Robust Optimization for Imitation Learning

Javed, Zaynah; Brown, Daniel; Sharma, Satvik; Zhu, Jerry; Balakrishna, Ashwin; Petrik, Marek; Dragan, Anca; Goldberg, Ken (January 2021, 38th International Conference on Machine Learning)

The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective that balances expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses that can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator’s reward function.
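A soft-robust objective of the kind the abstract describes can be sketched as a weighted blend of expected return and a tail-risk term over a discrete set of reward hypotheses. The abstract does not name the risk measure, so the use of conditional value at risk (CVaR) here is an assumption, as are the function names, the weight `lam`, and the convention that `alpha` in [0, 1) selects the worst (1 - alpha) fraction of outcomes. This illustrates the objective only, not the policy-gradient algorithm.

```python
def expected_return(returns, probs):
    """Mean policy return under the distribution over reward hypotheses."""
    return sum(p * r for r, p in zip(returns, probs))


def cvar(returns, probs, alpha):
    """CVaR of a discrete return distribution (higher return is better).

    Uses the variational form max over sigma of
        sigma - E[max(0, sigma - R)] / (1 - alpha),
    whose maximum for a discrete distribution is attained at one of the
    observed return values. Assumes 0 <= alpha < 1.
    """
    best = float("-inf")
    for sigma in returns:
        shortfall = sum(p * max(0.0, sigma - r) for r, p in zip(returns, probs))
        best = max(best, sigma - shortfall / (1.0 - alpha))
    return best


def soft_robust(returns, probs, alpha, lam):
    """Blend: lam * expected return + (1 - lam) * CVaR_alpha."""
    return lam * expected_return(returns, probs) + (1.0 - lam) * cvar(returns, probs, alpha)


if __name__ == "__main__":
    # Two equally likely reward hypotheses under which a policy earns 0 or 10.
    returns, probs = [0.0, 10.0], [0.5, 0.5]
    for lam in (1.0, 0.5, 0.0):
        print(lam, soft_robust(returns, probs, alpha=0.5, lam=lam))
```

Under these assumptions, setting `lam` to 1 recovers the risk-neutral objective and setting it to 0 optimizes the tail alone, which matches the range of behaviors from risk-neutral to risk-averse that the abstract mentions.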
Full Text Available
